Robert Turner, University of Sheffield RSE Team, September 2021
Contains elements from Reproducible Research Data and Project Management in R, by Anna Krystalli and from Methods in Research Software Engineering by David Wilby.
Mix of software engineering and research experience.
13 RSEs, 35 projects / year worth ~£11m total
Focusses on what to do, not how to do it.
What are the characteristics of well engineered research software?
Act as though every short term study will become a long term one @tomjwebb. Needs to be reproducible in 3, 20, 100 yrs
— Oceans Initiative (@oceansresearch) January 16, 2015
Take initiative & responsibility. Think long term.
@tomjwebb stay away from excel at all costs?
— Timothée Poisot (@tpoi) January 16, 2015
Do you agree?
THRILLED by this announcement by the Human Gene Nomenclature Committee. pic.twitter.com/BqLIOMm69d
— Janna Hutz (@jannahutz) August 4, 2020
But good for data viewing / entry, sometimes, perhaps…
@tomjwebb Entering via a database management system (e.g., Access, Filemaker) can make entry easier & help prevent data entry errors @tpoi
— Ethan White (@ethanwhite) January 16, 2015
@ethanwhite +1 Enforcing data types, options from selection etc, just some useful things a DB gives you, if you turn them on @tomjwebb @tpoi
— Gavin Simpson (@ucfagls) January 16, 2015
@tomjwebb it also prevents a lot of different bad practices. It is possible to do some of this in Excel. @tpoi
— Ethan White (@ethanwhite) January 16, 2015
Have a look at the Data Carpentry SQL for Ecology lesson
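As a minimal sketch of what "enforcing data types" and "options from selection" can look like, the example below uses Python's built-in sqlite3 module. The table name, column names and the allowed site codes are hypothetical, chosen only for illustration; a real project would define its own schema.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: site must be one of a fixed set of options,
# n_individuals must be a non-negative integer.
conn.execute(
    """
    CREATE TABLE observations (
        site          TEXT    NOT NULL CHECK (site IN ('A', 'B', 'C')),
        n_individuals INTEGER NOT NULL
            CHECK (typeof(n_individuals) = 'integer' AND n_individuals >= 0)
    )
    """
)


def try_insert(site, n_individuals):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO observations (site, n_individuals) VALUES (?, ?)",
                (site, n_individuals),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert("A", 3)          # valid row
bad_site = try_insert("Z", 3)          # site not in the allowed options
bad_type = try_insert("A", "twelve")   # text where an integer is required
bad_value = try_insert("A", -1)        # negative count
```

Only the first row is stored; the other three are rejected at entry time, before they can reach an analysis.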
.csv: comma separated values.
.tsv: tab separated values.
.txt: no formatting specified.
@tomjwebb It has to be interoperability/openness - can I read your data with whatever I use, without having to convert it?
— Paul Swaddle (@paul_swaddle) January 16, 2015
More unusual formats will need instructions on use.
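One way to write the same records as .csv or .tsv is sketched below, using only Python's standard csv module. The records and the helper name to_delimited are made up for illustration.

```python
import csv
import io

# Hypothetical records, for illustration only.
rows = [
    {"site": "A", "species": "Parus major", "count": 3},
    {"site": "B", "species": "Erithacus rubecula", "count": 1},
]


def to_delimited(records, delimiter=","):
    """Serialise a list of dicts as delimited plain text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0]),
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_delimited(rows)                   # comma separated
tsv_text = to_delimited(rows, delimiter="\t")   # tab separated

# Any tool that reads plain text can load the result; here, the csv module again.
round_trip = list(csv.DictReader(io.StringIO(csv_text)))
```

Note that values come back as text after the round trip, so column types are worth documenting alongside the file.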
Andrea De Santis, unsplash.com
A .csv or .tsv copy would need to be saved.
Use good null values; missing values are a fact of life:
NA or NULL are also good options.
Avoid 0 and numbers like -999.
@tomjwebb don't, not even with a barge pole, not for one second, touch or otherwise edit the raw data files. Do any manipulations in script
— Gavin Simpson (@ucfagls) January 16, 2015
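On the null-value advice above, the sketch below shows why an explicit marker is safer than 0. It assumes a hypothetical file in which blank cells, NA and NULL all mean "missing"; the column name temperature_c is invented for the example.

```python
import csv
import io

# Assumption: these three spellings all mean "missing" in this hypothetical file.
MISSING = {"", "NA", "NULL"}

# Stand-in for a raw data file. Site D has a genuine measurement of 0.
raw_text = "site,temperature_c\nA,12.5\nB,\nC,NA\nD,0\n"


def parse_temperature(cell):
    """Return a float, or None when the cell is marked as missing."""
    cell = cell.strip()
    return None if cell in MISSING else float(cell)


temperatures = [
    parse_temperature(row["temperature_c"])
    for row in csv.DictReader(io.StringIO(raw_text))
]

# Missing values are skipped; the real 0 at site D is kept.
observed = [t for t in temperatures if t is not None]
mean_observed = sum(observed) / len(observed)
```

Had the missing cells been coded as 0 or -999, they would be indistinguishable from measurements and would silently shift the mean.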
@tomjwebb @srsupp Keep one or a few good master data files (per data collection of interest), and code your formatting with good annotation.
— Desiree Narango (@DLNarango) January 16, 2015
Raw data are sacrosanct
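A minimal sketch of "do any manipulations in script": the raw file is marked read-only and only ever read, and the tidied copy is written to a separate folder. The data/raw and data/processed layout, the file name and the clean-up step are assumptions made for this example.

```python
import csv
import stat
import tempfile
from pathlib import Path


def process(raw_path, processed_dir):
    """Read the raw file, tidy it in code, and write the result elsewhere."""
    processed_dir.mkdir(parents=True, exist_ok=True)
    out_path = processed_dir / raw_path.name
    with raw_path.open(newline="") as src, out_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Example manipulation: strip stray whitespace, upper-case site codes.
            row["site"] = row["site"].strip().upper()
            writer.writerow(row)
    return out_path


# Demonstration in a temporary project folder (hypothetical layout).
project = Path(tempfile.mkdtemp())
raw_dir = project / "data" / "raw"
raw_dir.mkdir(parents=True)
raw_file = raw_dir / "survey.csv"
raw_file.write_text("site,count\n a ,3\nb,1\n")

# Mark the raw file read-only to guard against accidental edits.
raw_file.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)

raw_before = raw_file.read_text()
processed_file = process(raw_file, project / "data" / "processed")
```

Because every change lives in the script, the processed file can be regenerated from the untouched raw file at any time.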
Photo by Jon Moore, unsplash.com
Photo: Pexels CC0
main copy of files
@tomjwebb Back it up
— Ben Bond-Lamberty (@BenBondLamberty) January 16, 2015
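A backup is only useful if the copy is intact. The sketch below copies a file and checks the copy against a SHA-256 checksum. The helper names sha256 and back_up are hypothetical, and the demonstration copies within a temporary folder; in practice the backup folder would sit on separate storage.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, backup_dir):
    """Copy source into backup_dir and verify that the copy matches."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps where possible
    if sha256(source) != sha256(target):
        raise OSError(f"Backup of {source} does not match the original")
    return target


# Demonstration with a throwaway file.
workspace = Path(tempfile.mkdtemp())
main_copy = workspace / "survey.csv"
main_copy.write_text("site,count\nA,3\n")
backup_copy = back_up(main_copy, workspace / "backup")
```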
R